All code is given in the appendix, under the corresponding part of the related question, together with explanations.

Task 1

Part a

In this part, we visualize one instance from each class in a 3D scatter plot using the plotly package. We first take the cumulative sum of the acceleration over time to transform this information into a velocity vector. Note that the purple-blue colors show the earlier stages of the time series, whereas the green-yellow colors show the data points towards the end of the series. For easier recognition, view the plots along the y direction, so that z is the vertical axis and x the horizontal axis on the screen. Here are the plots corresponding to the different gesture types:
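The acceleration-to-velocity step and the time-colored 3D plot can be sketched as follows (a minimal sketch: the synthetic `acc` data frame stands in for one gesture instance's X/Y/Z accelerations, which are an assumption about the data layout):

```r
library(plotly)

# Synthetic stand-in for one instance's accelerations over 100 time steps
set.seed(1)
acc <- data.frame(X = rnorm(100), Y = rnorm(100), Z = rnorm(100))

# Cumulative sum over time turns acceleration into (unscaled) velocity
vel <- data.frame(
  x = cumsum(acc$X),
  y = cumsum(acc$Y),
  z = cumsum(acc$Z),
  t = seq_len(nrow(acc))  # time index drives the purple-to-yellow coloring
)

plot_ly(vel, x = ~x, y = ~y, z = ~z,
        type = "scatter3d", mode = "markers",
        marker = list(color = ~t, colorscale = "Viridis"))
```

Coloring by the time index is what makes the direction of the movement readable from a static 3D scatter.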

Class 1: In this plot, we can recognize the gesture by noting that the velocity points in the down-right direction at the beginning; it then changes direction to down-left after returning to the starting point 0.

Class 2: In this plot, we clearly see that the velocity points upward at first, then to the right, down, and left, respectively.

Class 3: In this plot, the movement (the velocity, to be precise) starts from the left and makes a small bounce upwards before slowing down.

Class 4: This is the opposite of the movement in class 3.

Classes 5-6: These are similar to classes 3 and 4; the direction is now downward and upward, respectively.

Classes 7-8: We can see from the plots below that these correspond to a clockwise and a counter-clockwise movement, respectively.

Part b

In this part, we have two distance measure alternatives: Euclidean and Manhattan. For both alternatives, k = 1 gives the best accuracy in 10-fold cross-validation. The first table below shows the Euclidean results and the second the Manhattan results.

##    Method Klev      Accu
## 1:    knn    1 0.9564732
## 2:    knn    3 0.9531250
## 3:    knn    5 0.9531250
## 4:    knn    7 0.9453125
## 5:    knn    9 0.9375000
##    Method Klev      Accu
## 1:    knn    1 0.9430804
## 2:    knn    3 0.9419643
## 3:    knn    5 0.9375000
## 4:    knn    7 0.9363839
## 5:    knn    9 0.9285714
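The cross-validation grid over k can be sketched as below (a minimal sketch: the synthetic `X`/`y` stand in for the flattened gesture features, which is an assumption about the representation; `class::knn` uses the Euclidean distance):

```r
library(class)
library(data.table)

# Synthetic two-class data standing in for the gesture feature matrix
set.seed(1)
X <- matrix(rnorm(200 * 10), nrow = 200)
y <- factor(rep(1:2, each = 100))
X[y == 2, ] <- X[y == 2, ] + 2  # shift one class so the problem is learnable

folds <- sample(rep(1:10, length.out = nrow(X)))  # 10-fold assignment

results <- rbindlist(lapply(c(1, 3, 5, 7, 9), function(k) {
  acc <- sapply(1:10, function(f) {
    pred <- knn(train = X[folds != f, ], test = X[folds == f, ],
                cl = y[folds != f], k = k)
    mean(pred == y[folds == f])  # fold accuracy
  })
  data.table(Method = "knn", Klev = k, Accu = mean(acc))
}))
print(results)
```

Averaging the per-fold accuracies per k produces tables of the same shape as the output above.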

Part c

In this part, we first calculate the 8-by-8 confusion matrix for both distance measures with k = 1. The runtimes and accuracy values can be seen below. The accuracy is higher for the Euclidean distance measure, as expected (since the data is 3-dimensional). Predictions with the Manhattan-distance kNN model took more time than the Euclidean ones because of how the distance matrices are computed in R: the test-to-train distance matrix is computed efficiently inside the knn function for the Euclidean case, whereas computing the Manhattan distance matrix with the analogue package is more time-consuming. (Computing the Euclidean distances with the analogue package also takes a very long time.) Computing the Manhattan distances with for loops instead of the analogue package was tried as well, but it performed even worse in terms of runtime.

##               predictions_euclidean
## observed_class   1   2   3   4   5   6   7   8
##              1 429   0   0   2   0   5   1   0
##              2   1 451   0   0   0   0   0   0
##              3   2   0 417   1  15  15   4   0
##              4   6   0   0 379  49  11   0   5
##              5   3   0  10   3 415   2   0   0
##              6   6   0   6  12  17 407   0   1
##              7   0   0   3   0   0   0 444   0
##              8   0   0   0   4   2   0   0 454
##               predictions_manhattan
## observed_class   1   2   3   4   5   6   7   8
##              1 431   0   0   2   0   3   1   0
##              2   1 450   1   0   0   0   0   0
##              3   0   0 410   1  16  22   5   0
##              4   2   0   0 379  48  14   0   7
##              5   2   0   6   7 415   3   0   0
##              6  10   0  12  15  30 382   0   0
##              7   1   0   5   0   0   1 440   0
##              8   0   0   0   3   4   0   0 453
## [1] "Runtime for manhattan distances is:"
## Time difference of 1.085912 mins
## [1] "Runtime for euclidean distances is:"
## Time difference of 2.065118 secs
## [1] "Accuracy of the predictions with euclidean distance: "
## [1] 0.9480737
## [1] "Accuracy of the predictions with manhattan distance: "
## [1] 0.9380235
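The Manhattan 1-NN path can be sketched as below (a minimal sketch: the synthetic train/test matrices are stand-ins, and it assumes `analogue::distance` accepts two matrices and a `method = "manhattan"` argument, as described in the text above):

```r
library(analogue)

# Synthetic stand-ins for the train/test feature matrices and train labels
set.seed(1)
Xtr <- matrix(rnorm(100 * 5), nrow = 100)
ytr <- factor(rep(1:2, each = 50))
Xte <- matrix(rnorm(20 * 5), nrow = 20)

t0 <- Sys.time()
D <- distance(Xte, Xtr, method = "manhattan")   # test-to-train distance matrix
pred_manhattan <- ytr[apply(D, 1, which.min)]   # 1-NN: take the nearest label
print(Sys.time() - t0)                          # timing, as reported above
```

The Euclidean case instead goes through `class::knn` directly, which computes its distance matrix internally; that internal computation is the faster of the two routes.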

Task 2

Part a

The package named “penalized” was used for this task. There is no need to scale the data, as the units are the same for all time indices. The -1 values in the data were recoded as 0. The lambda values (lambda1 and lambda2) were found using the package's built-in methods (optL1 and optL2). The threshold was chosen as 0.6 by inspecting the performance object of the ROCR package and by trial and error (the cuts can be seen below). Since my aim was to maximize accuracy, I focused on minimizing the false negative rate (if class 1 indicates a heart attack, the false negative rate should be kept as low as possible to reduce risk). The CV folds were not stratified, as this option is absent from the “penalized” package's cv function. The accuracy is 83%.

##          cut       fpr      tpr
## 67 0.5706726 0.2500000 0.890625
## 68 0.5620297 0.2777778 0.890625
## 66 0.6022946 0.2500000 0.875000
## 65 0.6271219 0.2500000 0.859375
## 64 0.6417844 0.2500000 0.843750
## 63 0.6540222 0.2500000 0.828125
##          
## testclass  0  1
##         0 27  9
##         1  8 56
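The tuning and thresholding steps can be sketched as below (a minimal sketch: the synthetic data stands in for the ECG series, and it assumes a penalized version whose `optL1`/`penalized` functions accept the `fusedl` argument for the fused-lasso penalty):

```r
library(penalized)
library(ROCR)

# Synthetic stand-in for the 0/1-labeled series data
set.seed(1)
Xtr <- matrix(rnorm(100 * 20), nrow = 100)
ytr <- rbinom(100, 1, plogis(Xtr[, 5] - Xtr[, 6]))

# Tune lambda1 by cross-validation, then fit the fused lasso with it
l1  <- optL1(ytr, penalized = Xtr, fusedl = TRUE, fold = 10)
fit <- penalized(ytr, penalized = Xtr, lambda1 = l1$lambda, fusedl = TRUE)

probs <- predict(fit, penalized = Xtr)  # fitted probabilities (logistic model)

# Cut/fpr/tpr triples, as in the table above, to pick a threshold
perf <- performance(prediction(probs, ytr), measure = "tpr", x.measure = "fpr")

testclass <- as.integer(probs > 0.6)    # threshold read off the cuts
table(testclass, observed = ytr)
```

The same pipeline, refit on the differenced series, is what part (c) reuses with a threshold of 0.525.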

Part b

We plot two instances (one of class 1 and one of class 0) together with the coefficients of the fused lasso on the same plot. From the graph below, we can say that the coefficients are chosen to be non-zero in the intervals where the two time series differ/change the most. Consecutive coefficients are mostly equal due to the fused-lasso penalties. There is also feature selection, since most of the coefficients are zero-valued.
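The overlay can be sketched as below (a minimal sketch: `fit` is assumed to be the fitted penalized model from part (a), and `ts1`/`ts0` are hypothetical names for one instance of each class):

```r
# Extract the penalized (per-time-index) coefficients from the fitted model
beta <- coefficients(fit, which = "penalized")

# Overlay both series; the step line makes the runs of equal consecutive
# coefficients show up as flat segments
matplot(cbind(ts1, ts0), type = "l", lty = 1,
        xlab = "time index", ylab = "value")
lines(beta, type = "s", lwd = 2)
```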

Part c

In this part, the consecutive differences were calculated, and then the same steps were applied as in part a. The threshold was chosen as 0.525 this time. The accuracy increased to 87% with this transformation; both the false positives and the false negatives decreased.

##          cut       fpr      tpr
## 66 0.5255885 0.1944444 0.906250
## 67 0.5227596 0.2222222 0.906250
## 65 0.5270556 0.1944444 0.890625
## 64 0.5293752 0.1944444 0.875000
## 63 0.5707328 0.1944444 0.859375
## 62 0.5915024 0.1944444 0.843750
##          
## testclass  0  1
##         0 29  7
##         1  6 58
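The transformation itself is a one-liner (a minimal sketch, assuming each row of `X` holds one series; the tiny matrix here is only for illustration):

```r
# Toy stand-in: two series of length 4
X <- rbind(c(1, 3, 6, 10),
           c(2, 2, 5, 9))

# Replace each series by its consecutive differences (one column shorter)
Xdiff <- t(apply(X, 1, diff))
# e.g. diff(c(1, 3, 6, 10)) is c(2, 3, 4)
```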

Part d

We see that the coefficients became larger, but they are “aggregated” and smoothed versions of the ones in part b. Fewer coefficients became non-zero (they behave like sums of the old ones). We can deduce that the consecutive differences in this data are more informative than the separate time points. The early decrease around time index 10, followed by the high rates around 40, is captured. Taking consecutive differences smoothed out the “steps” of the plot in part b.